Brain connectomes in youth at risk for serious mental illness: an exploratory analysis

Background: Identifying early biomarkers of serious mental illness (SMI), such as changes in brain structure and function, can aid in early diagnosis and treatment. Whole brain structural and functional connectomes were investigated in youth at risk for SMI.
Methods: Participants were classified as healthy controls (HC; n = 33), familial risk for serious mental illness (stage 0; n = 31), mild symptoms (stage 1a; n = 37), attenuated syndromes (stage 1b; n = 61), or discrete disorder (transition; n = 9) based on clinical assessments. Imaging data were collected from two sites. Graph-theory based analysis was performed on the connectivity matrix constructed from whole-brain white matter fibers derived from constrained spherical deconvolution of the diffusion tensor imaging (DTI) scans, and from the correlations between brain regions measured with resting state functional magnetic resonance imaging (fMRI) data.
Results: Linear mixed effects analysis and analysis of covariance revealed no significant differences between groups in global or nodal metrics after correction for multiple comparisons. A follow-up machine learning analysis broadly supported the findings. Several non-overlapping frontal and temporal network differences were identified in the structural and functional connectomes before corrections.
Conclusions: Results suggest significant brain connectome changes in youth at transdiagnostic risk may not be evident before illness onset.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12888-022-04118-4.

functional connectome [2], decreased network efficiency, and disrupted small-worldness [3] are predictive of transition to psychosis in CHR. However, early prodromal phase symptoms are not well-distinguished, and CHR studies do not consider the range of phenotypic and functional outcomes that may be present in the early phases of illness [4][5][6]. Therefore, there is increasing consensus that broader transdiagnostic approaches may be more suitable to investigate risk for SMI [7].
Second, studies have often focused on specific tracts or regions of interest [8]. Given that most SMI involve disrupted communication involving several brain regions [9], a far more useful approach may be to investigate whole brain connectivity networks. Aberrant connectivity has been observed in almost all major mental disorders, and disruption in these circuits often results in susceptibility to broad domains of psychopathology rather than discrete disorders, providing further support for the use of transdiagnostic over discrete disorder models [10].
Finally, no studies so far have investigated both structural as well as functional connectivity in the same sample of individuals at transdiagnostic risk, despite evidence that simultaneous analyses of functional and structural networks may provide complementary insights into brain organization for psychopathology [11].
In the current study, we used a closely matched data analysis pipeline to investigate whole brain structural and functional connectomes in a sample of individuals at transdiagnostic risk for SMI compared to controls. We followed this up with a machine learning (ML) analysis of the data to determine whether linear support-vector machine (SVM) analysis can identify combinations of features which may help distinguish between the groups. As this is the first study of its kind, we abstained from proposing specific hypotheses as there are no previous studies on which to base our predictions. Most previous neuroimaging studies have either conceptualized transdiagnostic risk differently (e.g., [12]) or focused on CHR classification. However, CHR findings cannot be used as hypotheses for transdiagnostic studies owing to the significant difference between the two approaches as well as differences in the conditions to which participants transition.

Participants
Participants for the current study were recruited from the larger Canadian Psychiatric Risk and Outcome study (PROCAN), which investigates youth at risk for SMI and consists of participants from the University of Calgary and Sunnybrook Health Sciences Centre [13,14]. Participants were included in the study if they were 12-25 years of age, had an IQ > 70, did not meet criteria (at baseline) for a SMI or any medical condition affecting the central nervous system, and met one of the staging criteria [15]. To determine clinical stage assignment, a consensus-based decision-making process was used. Participants who presented with familial risk factors (e.g., having a first-degree relative with a psychiatric condition) but were asymptomatic were categorized as stage 0. Participants who presented with mild anxiety or depressive symptoms were classified as stage 1a, and participants who presented with attenuated syndromes were classified as stage 1b. The terms stage 0, stage 1a, and stage 1b will be used here to refer to familial risk for serious mental illness, mild symptoms, and attenuated syndromes, respectively. A group of healthy controls (HC) with no personal or family history of mental illness was also recruited for comparison. None of the healthy controls or stage 0 participants met any of the criteria for stages 1a and 1b. All participants were monitored over time to determine transition to SMI. Participants who went on to meet criteria for a SMI during a 12-month follow-up period were placed in the transition group for analysis, instead of the group based on baseline symptom level (Table 1; Supplementary Table 1).
Participant and imaging details have been described previously [13,14,16]. Briefly, participants were recruited from two sites of the larger PROCAN project: University of Calgary (Calgary) and Sunnybrook Health Sciences Centre (Toronto). Of the 243 participants in PROCAN, 11 participants transitioned to a SMI over 12 months (all from the Calgary site): 10 participants met criteria for major depressive disorder (MDD) and 1 participant met criteria for bipolar disorder.
Imaging data was available for 173 participants, including 9 transition participants (all of whom met criteria for MDD and all of whom were female). Two participants were excluded for poor quality data, bringing the final sample for analysis in the current study to 171 participants (140 from University of Calgary and 31 from Sunnybrook Health Sciences Centre). For demographic and clinical details of all participants included in the current paper, see Table 2.

Measures
Assessment measures have been described in detail elsewhere [13,14] (Supplementary Table 1). Briefly, measures used to determine stage of risk included: Structured Interview for Psychosis Risk Syndromes [17], Scale of Psychosis-Risk Symptoms (SOPS) [18], Kessler 10 Distress Scale [19], and Quick Inventory of Depressive Symptoms [20]. The Structured Clinical Interview for DSM-5 (SCID-5) [21] was used to confirm transition to a SMI by 12 months. Transition to MDD was defined here as the presence of more than one major depressive episode.

MRI acquisition
Complete image acquisition details including scanner model and software version, coil details, software used, and standardized scanning protocols have been described previously [22] and are described in brief here.
Imaging data were acquired on a GE 3.0 T Discovery MR750 (University of Calgary) or Philips 3.0 T Achieva scanner (Sunnybrook Health Sciences Centre). To minimize scanner differences, reference protocols were established for each site and scanner type. All data was visually inspected by expert quality-control raters, and any scans with artifacts were removed before analysis.

Structural imaging
Diffusion imaging data were acquired using single shot spin echo echo-planar imaging sequence on a GE 3.0 T Discovery MR750 (University of Calgary) or Philips 3.0 T Achieva scanner (Sunnybrook Health Sciences Centre). Diffusion sensitizing gradients (b = 1000 s/mm²) were applied along 45 (University of Calgary) and 30 (Sunnybrook Health Sciences Centre) noncollinear directions. Eight images without diffusion weighting (b = 0 s/mm²) were acquired. Isotropic 2.2 mm voxels were acquired (resampled to 0.86 mm in-plane), FOV = 220 × 220 mm, matrix = 256 × 256, TR = 8000 ms, TE = 94 ms, flip angle = 90°, anterior-posterior phase encoding. Both sites used image space reconstruction [GE ASSET (University of Calgary) and Philips SENSE (Sunnybrook Health Sciences Centre)]. Total acquisition time was 7 min and 12 s at University of Calgary and 5 min and 15 s at Sunnybrook Health Sciences Centre.

Functional imaging
For the University of Calgary site, the fMRI acquisition used gradient echo EPI with the following parameters:

Structural connectome analysis
DTI data was visually checked using FSL [23] and ExploreDTI v4.8.6 [24]. Data was pre-processed in ExploreDTI to correct for subject motion and eddy current distortions, with diffusion vectors rotated as required and automatic background masking applied [25]. Two participants were excluded owing to poor quality data (lack of fully connected DTI data) before arriving at the current sample of 171. Tractography analysis was run in ExploreDTI. Automated whole brain constrained spherical deconvolution (CSD) was performed using a white matter mask derived from diffusion data for each participant in native space. The minimum fractional anisotropy (FA) threshold was set to 0.20 to initiate and continue tracking, and the angle threshold was set to 30°.
In order to build the individual DTI-based structural connectivity matrices, the Automated Anatomical Labeling (AAL) template from MRIcron [26] was used to subdivide the brain into 90 regions excluding the cerebellum [27,28]. The AAL template and whole-brain fiber tractography were used as inputs to create a 90 × 90 regionwise connectivity matrix for each individual with the "PASS" option, which means 2 regions are considered to be connected even if a third region passed through [29]. Each element of the matrix contained the averaged FA value within the connected fiber tracts between regions and was set to zero if there was no connection [30]. This weighted connectivity matrix was binarized for the calculations of the graph theoretical metrics.

Functional connectome analysis
All T1-weighted structural images and resting state fMRI (rs-fMRI) scans were visually examined for artifacts or distortions prior to processing. The data were processed using AFNI, FSL and REST [23,31,32]. The T1 images from each participant were skull stripped and co-registered to their fMRI images prior to being parcellated into grey matter, white matter, and cerebrospinal fluid (CSF). The first five volumes from each of the rs-fMRI scans were removed to ensure signal stabilization, leaving a total of 295 volumes. The resting state scans then underwent correction for slice timing and head movement. The average relative framewise displacement (FD) was calculated for each participant [33]. Furthermore, as per [34], a 36-parameter matrix was generated from each participant's rs-fMRI data. This matrix included the averaged signals from the individual whole brain mask, CSF mask, white matter mask, the six head motion parameters, and their temporal derivatives and quadratic term signals. Then a spike matrix was created using any volumes with a high FD (> 0.3 mm) [35]. These two matrices were combined, and their effects were regressed out of the rs-fMRI data. The rs-fMRI scans were then normalized to the MNI152 2009a nonlinear symmetric atlas (https://www.bic.mni.mcgill.ca/ServicesAtlases/ICBM152NLin2009), band pass filtered between 0.009 and 0.08 Hz, linear trends were removed, and finally the scans were smoothed using a 4 mm full width at half max Gaussian kernel. Slice timing, head motion correction, T1-weighted image segmentation, head motion outlier detection, coregistration, and spatial normalization and smoothing were done in FSL 6.0.3 [23]. Regression of the nuisance signals, band-pass filtering, and linear trend removal were done using AFNI version 18.0.13 [31].
The same AAL template [27,28] that was employed for the DTI analysis was used to parcellate the rs-fMRI images into the 90 × 90 region-wise connectivity matrices used in the graph theory analyses. Each element of the raw connectivity matrix contained the averaged correlation value of the blood oxygenation level dependent (BOLD) signal fluctuations between regions. For this analysis, all negative correlations were set to zero and positive correlations were thresholded using a p-value of p < 0.05. All positive correlations weaker than the threshold value were set to zero, and all positive correlations greater than, or equal to, the threshold were set to one. All connectivity matrices were fully connected.
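The binarization rule described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function name `binarize_positive` and the parameter `r_threshold` are assumptions, where `r_threshold` stands in for the correlation value corresponding to p < 0.05 (how that value was derived is not specified here), and the toy matrix is not study data.

```python
import numpy as np

def binarize_positive(corr, r_threshold):
    """Binarize a correlation matrix: 1 where r >= r_threshold, else 0.

    r_threshold is a hypothetical stand-in for the correlation value that
    corresponds to p < 0.05. It must be positive so that all negative
    correlations are set to zero.
    """
    if r_threshold <= 0:
        raise ValueError("r_threshold must be positive")
    corr = np.asarray(corr, dtype=float)
    adj = (corr >= r_threshold).astype(int)
    np.fill_diagonal(adj, 0)  # ignore self-connections
    return adj

# Toy 3-region example (illustrative values only)
toy_corr = np.array([[1.0, 0.5, -0.4],
                     [0.5, 1.0, 0.1],
                     [-0.4, 0.1, 1.0]])
toy_adj = binarize_positive(toy_corr, r_threshold=0.2)
```

In the toy example only the 0.5 correlation survives, so the single remaining edge links the first two regions.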

Global metrics
Assortativity is the tendency of nodes to link to other nodes with a similar number of edges. Hierarchy coefficient identifies the presence of hierarchical organization in a network.
Small-worldness refers to the property of combining high levels of local clustering among nodes of a network with short paths that globally link all nodes of the network. Small-worldness analysis [39] used 100 randomly generated networks and the following measures: clustering coefficient measures the extent of local clustering of a network; shortest path length quantifies the mean distance or routing efficiency between a node and all the other nodes in the network; γ is the ratio between the real clustering coefficient and that of the random networks; λ is the ratio between the real shortest path length and that of the random networks; and σ is the ratio of γ to λ.
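The small-world indices can be illustrated with a short sketch for binary, connected, undirected graphs. It is not the GRETNA implementation: the function names are hypothetical, the shortest path length here is the plain arithmetic mean over node pairs, generation of the random reference networks is left out (their clustering and path length values are simply passed in), and σ is taken as γ divided by λ, which is the usual convention.

```python
from collections import deque
import numpy as np

def clustering_coefficient(adj):
    """Mean local clustering coefficient of a binary undirected graph."""
    a = np.asarray(adj, dtype=float)
    deg = a.sum(axis=1)
    triangles = np.diag(a @ a @ a) / 2.0   # triangles through each node
    possible = deg * (deg - 1) / 2.0       # neighbour pairs per node
    local = np.divide(triangles, possible,
                      out=np.zeros_like(triangles), where=possible > 0)
    return float(local.mean())

def shortest_path_length(adj):
    """Mean shortest path length over all node pairs (graph must be connected)."""
    a = np.asarray(adj)
    n = a.shape[0]
    total = 0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:                        # breadth-first search
            u = queue.popleft()
            for v in np.flatnonzero(a[u]):
                v = int(v)
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))

def small_world_indices(c_real, l_real, c_rand, l_rand):
    """Return (gamma, lambda, sigma) given metrics of matched random networks."""
    gamma = c_real / float(np.mean(c_rand))
    lam = l_real / float(np.mean(l_rand))
    return gamma, lam, gamma / lam

# Five-node ring: no triangles, mean path length 1.5
ring = np.zeros((5, 5), dtype=int)
for i in range(5):
    ring[i, (i + 1) % 5] = ring[(i + 1) % 5, i] = 1

# Hypothetical metric values, for illustration only
gamma, lam, sigma = small_world_indices(0.6, 2.2, [0.2, 0.4], [2.0, 2.0])
```

A network is usually described as small-world when γ is well above 1 and λ is close to 1, which makes σ greater than 1.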
Synchronization measures the likelihood that all nodes fluctuate in the same wave pattern. Global efficiency measures the efficiency of information propagation through the whole network. Local efficiency assesses the efficiency of information propagation over a node's direct neighbors.
Intensity measures the mean strength of the connectivity matrix by averaging all elements in the weighted matrix. For the functional analysis, this was done by averaging only the positive connections that were included in the connectome.
Density is the ratio between the number of existing edges in the connectome and the size of the matrix.

Modular interactions
Modularity indicates the extent to which a network is organized into modules or communities with dense connections within them but sparse connections between them.

Nodal metrics
Betweenness centrality of a node measures its effect on information flow between other nodes. Degree centrality measures the number of the connections directly to a node. Nodal clustering coefficient refers to the extent of interconnectivity among neighbors of an index node. Nodal efficiency indicates how efficiently an index node communicates with the other nodes. Nodal local efficiency measures how efficient the communication is among the first neighbors of a node when it is removed. Participant coefficient is the ability of an index node to keep the communication between its own module and other modules. In this study we use the normalized participant coefficient, which corrects for the effects of the number of modules.
The whole-brain averaged metrics are the mean of each metric across all nodes of the graph.
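As an illustration of two of the nodal metrics above, the sketch below computes degree centrality and the participation coefficient (called the participant coefficient in this text) for a toy binary graph. It uses the standard unnormalized definition, one minus the sum over modules of the squared fraction of a node's edges that fall in each module. The normalization applied in this study is not reproduced, and the function name and toy graph are assumptions.

```python
import numpy as np

def participation_coefficient(adj, modules):
    """Unnormalized participation coefficient for each node.

    adj: binary undirected adjacency matrix.
    modules: module label for each node.
    """
    adj = np.asarray(adj, dtype=float)
    modules = np.asarray(modules)
    degree = adj.sum(axis=1)
    pc = np.ones(adj.shape[0])
    for m in np.unique(modules):
        k_im = adj[:, modules == m].sum(axis=1)  # edges from each node into module m
        ratio = np.divide(k_im, degree,
                          out=np.zeros_like(degree), where=degree > 0)
        pc -= ratio ** 2
    pc[degree == 0] = 0.0  # isolated nodes have no edges to distribute
    return pc

# Toy graph: nodes 0-2 form module 0, nodes 3-4 form module 1
toy = np.zeros((5, 5), dtype=int)
for i, j in [(0, 1), (0, 2), (0, 3), (0, 4), (3, 4)]:
    toy[i, j] = toy[j, i] = 1
toy_modules = [0, 0, 0, 1, 1]

degree_centrality = toy.sum(axis=1)
pc = participation_coefficient(toy, toy_modules)
```

Node 0 splits its four edges evenly between the two modules and therefore has a coefficient of 0.5, while nodes 1 and 2 connect only within their own module and have a coefficient of 0.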

Machine learning analysis
Traditional supervised ML techniques were used to analyze the combined structural and functional data. The structural and functional connectivity data were combined into a single tabular dataset consisting of 1359 features from 171 total participants (Supplementary Fig. 1). These features consisted of the connectivity metrics noted above from both the structural and functional imaging data in addition to the covariates age, head movement, and imaging site.
Traditional ML techniques were used due to their interpretability and performance on datasets with a limited number of samples. We selected a linear SVM classifier as our model. Multi-class linear SVM classifiers define a set of hyperplanes in feature space which maximizes the margin between the various class labels. A multi-class linear SVM model classifies a given data point by selecting the class whose decision boundary hyperplane is furthest from the data point within feature space. Linear SVMs remain effective in high-dimensionality feature spaces and have been used to classify connectivity data in past studies [42][43][44][45]. A one-versus-rest multiclass strategy was employed, with one model trained per category label.
We employed a leave-one-out (LOO) cross-validation methodology to train and test our models. In LOO cross-validation, the model is trained on the entire dataset except a single hold-out test sample. Each sample is tested once and the model is evaluated by considering the accuracy of the model across all folds. The data preprocessing and hyperparameter tuning was performed within each cross-validation fold to ensure that transformation of the training data was not biased by the test sample.
The dataset labels were highly imbalanced, owing to differences in group size. To address this imbalance, we upscaled under-represented categories during training such that each category had an equal number of samples. After upscaling, we applied a Yeo-Johnson power transform [46] to remove skewness and transform the input data into more Gaussian-like distributions. A z-score standard scaler was also applied following the power transform to normalize the data to a zero mean with unit variance.
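The training-set preparation described above might look like the following sketch, which assumes scikit-learn is available. Reading "upscaled" as random oversampling with replacement is an assumption, as are the helper name `oversample_to_balance`, the seeds, and the synthetic data. In a cross-validation setting these steps would be fitted on the training portion of each fold only.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer, StandardScaler

def oversample_to_balance(X, y, seed=0):
    """Randomly resample (with replacement) so every class matches the largest."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=target - n, replace=True)
        idx.extend(members.tolist() + extra.tolist())
    idx = np.array(idx)
    return X[idx], y[idx]

# Synthetic, skewed training data (illustrative only)
rng = np.random.default_rng(42)
X_train = rng.exponential(size=(30, 4))
y_train = np.array([0] * 20 + [1] * 7 + [2] * 3)

X_bal, y_bal = oversample_to_balance(X_train, y_train)

# Yeo-Johnson power transform, then a separate z-score scaling step
power = PowerTransformer(method="yeo-johnson", standardize=False)
scaler = StandardScaler()
X_ready = scaler.fit_transform(power.fit_transform(X_bal))
```

The fitted `power` and `scaler` objects would then be applied to the held-out sample with `transform` only, so that the test sample does not influence the fitted parameters.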
Due to the high dimensionality of the data compared to the number of samples available, we pre-fit a linear SVM classifier model to the training data to perform feature selection. We performed feature selection using a slightly modified normal-based criterion as described in [47].
Features were selected based on the L1 norm of the feature weights from each label's model divided by the L1 norm of feature weights across all features and labels to obtain a normalized weight per feature. We selected any features with weights greater than or equal to 0.1% of the sum of feature weights for our hyperparameter tuning, training, and validation steps. Since the feature weights vector describes the normal of the hyperplane decision boundaries, normal-based feature selection selects those features which significantly influence the width of the margin between hyperplanes and their associated support vectors.
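A minimal sketch of this weight-based selection rule is given below. It assumes the per-label feature weights are available as a matrix with one row per label and one column per feature (for example, the coefficient matrix of a fitted one-versus-rest linear SVM). The function name and the toy weights are hypothetical, and the exact modification of the criterion in [47] is not reproduced.

```python
import numpy as np

def select_features(weights, fraction=0.001):
    """Select features whose normalized L1 weight is at least `fraction`.

    weights: array of shape (n_labels, n_features).
    Returns the selected column indices and the normalized weight per feature.
    """
    weights = np.asarray(weights, dtype=float)
    per_feature = np.abs(weights).sum(axis=0)    # L1 norm across labels
    normalized = per_feature / per_feature.sum()  # weights now sum to 1
    return np.flatnonzero(normalized >= fraction), normalized

# Toy weights for 2 labels and 3 features (illustrative only)
toy_weights = np.array([[0.5, 0.0, -0.01],
                        [-0.4, 0.05, 0.04]])
selected, normalized = select_features(toy_weights, fraction=0.1)
```

For scale, with 1359 features a perfectly uniform weighting would give each feature about 0.074% of the total, so a 0.1% cut-off keeps features weighted roughly 1.4 times the uniform share or more.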
A hyperparameter grid search was performed for each fold to select the optimal regularization parameter. A regularization parameter (C) of 0.1 was selected in 130 out of 171 folds, providing high regularization strength.
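Leave-one-out evaluation with tuning nested inside each fold can be sketched as below, assuming scikit-learn. This is not the published source code [48]: the helper name, the candidate values of C, the three-fold inner search, and the synthetic data are all assumptions, and oversampling, the power transform, and feature selection are omitted for brevity.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def loo_predictions(X, y, c_grid=(0.01, 0.1, 1.0)):
    """Predict each sample from a model tuned and fitted on all other samples."""
    preds = np.empty_like(y)
    chosen_c = []
    for train_idx, test_idx in LeaveOneOut().split(X):
        # Scaling sits inside the pipeline, so it is refitted per fold
        pipe = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
        search = GridSearchCV(pipe, {"linearsvc__C": list(c_grid)}, cv=3)
        search.fit(X[train_idx], y[train_idx])
        preds[test_idx] = search.predict(X[test_idx])
        chosen_c.append(search.best_params_["linearsvc__C"])
    return preds, chosen_c

# Synthetic, well-separated three-class data (illustrative only)
rng = np.random.default_rng(0)
centers = np.array([[6, 0, 0, 0, 0],
                    [0, 6, 0, 0, 0],
                    [0, 0, 6, 0, 0]], dtype=float)
X_demo = np.vstack([c + rng.normal(size=(8, 5)) for c in centers])
y_demo = np.repeat([0, 1, 2], 8)

preds, chosen_c = loo_predictions(X_demo, y_demo)
accuracy = float((preds == y_demo).mean())
```

Keeping every fitted step inside the loop means the held-out sample never contributes to scaling or to the choice of C. Tallying `chosen_c` across folds gives a summary comparable to the per-fold selection counts reported above.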
The source code used for our machine learning analysis is publicly available [48].

Statistical analysis
Data analysis procedures were similar for both structural and functional data. Data was analyzed using IBM SPSS version 26 [49] and the GRETNA toolbox [36]. A combination of linear mixed effects (LME) and analysis of covariance (ANCOVA) procedures was used to test group differences on global and nodal metrics, controlling for site and age. LME was used for global metrics and modular interactions as it provides better control for site, while ANCOVA was used for nodal metrics owing to the large number of comparisons. As there was no variability For all global metrics and modular interactions, LME analyses were run with site as a random variable throughout the analysis. For each metric, three models were developed, with factors added at every stage. Model improvement was tested using chi-square tests of -2 log-likelihood (-2LL) values estimated using maximum likelihood. The initial model with no predictors (model 0) was compared to the model with age as a fixed factor (model 1) to observe whether global metrics changed as a function of age for the entire sample. After this, group and age were added as fixed factors in model 2, and model improvement was tested again (in the functional connectivity analysis, head motion, which was quantified using the relative estimated mean displacement from FSL MCFLIRT, was also included along with age in models 1 and 2 as a fixed factor). Uncorrected values are reported, but results were only interpreted if they were significant after false discovery rate (FDR) correction for multiple comparisons.
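The model comparison step can be illustrated with a short sketch of a chi-square test on the change in -2LL between nested models, assuming SciPy is available. The function name and the -2LL values are hypothetical. The degrees of freedom equal the number of parameters added between models, for example 1 when a single covariate such as age is added.

```python
from scipy.stats import chi2

def likelihood_ratio_test(neg2ll_reduced, neg2ll_full, df_diff):
    """Chi-square test of the drop in -2LL from a reduced to a fuller nested model."""
    statistic = neg2ll_reduced - neg2ll_full
    p_value = float(chi2.sf(statistic, df_diff))
    return statistic, p_value

# Hypothetical -2LL values for model 0 (no predictors) and model 1 (age added)
statistic, p_value = likelihood_ratio_test(103.84, 100.0, df_diff=1)
```

With one added parameter, a drop in -2LL of about 3.84 corresponds to p of about 0.05, so smaller improvements would not count as significant model improvement at that level.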
To examine separate brain networks (or modules) [28], the 90 AAL regions were clustered based on the seven networks described in Yeo et al. [50]: the visual, somatomotor, dorsal attention, ventral attention, limbic, frontoparietal, and the default mode networks. The bilateral caudate, putamen, pallidum, and thalamus were clustered to form an eighth network named the deep gray matter network [51]. Modular interactions (i.e., total number of edges) between (28 comparisons) and within (8 comparisons) these networks, and the participation coefficient based on the network parcellation, were calculated and compared between groups with LME analysis as previously described. Uncorrected values are reported, but results were only interpreted after FDR correction for multiple comparisons.
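Counting modular interactions from a binarized connectome can be sketched as follows. The function name, the integer module labels, and the toy graph are assumptions. With eight modules, the same bookkeeping yields the 8 within-module and 28 between-module counts referred to above.

```python
from itertools import combinations
import numpy as np

def modular_interactions(adj, labels, n_modules):
    """Count edges within and between modules of a binary undirected graph.

    Returns an upper-triangular matrix: entry (a, a) holds the edges within
    module a, and entry (a, b) with a < b holds the edges between a and b.
    """
    adj = np.asarray(adj)
    counts = np.zeros((n_modules, n_modules), dtype=int)
    rows, cols = np.triu_indices_from(adj, k=1)  # visit each node pair once
    for i, j in zip(rows, cols):
        if adj[i, j]:
            a, b = sorted((labels[i], labels[j]))
            counts[a, b] += 1
    return counts

# Toy graph: a 4-node cycle split into two modules (illustrative only)
toy = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (0, 3)]:
    toy[i, j] = toy[j, i] = 1
counts = modular_interactions(toy, labels=[0, 0, 1, 1], n_modules=2)

# Number of distinct module pairs for eight networks
n_between_pairs = len(list(combinations(range(8), 2)))
```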
For structural and functional nodal metrics, analysis of covariance (ANCOVA) with FDR correction was preferred over LME owing to the large number of variables (90 brain regions) per comparison. Age and site were included as covariates; head motion was also included as a covariate for functional connectivity analysis. Uncorrected values are reported, but results were only interpreted if they were significant after FDR correction.
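To show why an uncorrected nodal effect can disappear after correction, the sketch below applies the Benjamini-Hochberg procedure to a set of p-values. The text does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the function name and p-values are illustrative.

```python
import numpy as np

def fdr_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg adjusted p-values and rejection flags."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)       # p * m / rank
    # Enforce monotonicity from the largest p-value downwards
    adjusted_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted_sorted = np.clip(adjusted_sorted, 0.0, 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted                  # back to input order
    return adjusted, adjusted <= alpha

# One nominally significant region among 90 (illustrative values)
p_nodal = [0.01] + [0.5] * 89
adj_nodal, reject_nodal = fdr_bh(p_nodal)

# A small set in which every test survives correction
adj_small, reject_small = fdr_bh([0.01, 0.04, 0.03, 0.005])
```

In the first example the single p = 0.01 is adjusted to 0.5 across 90 tests and is not retained, which mirrors the pattern of effects that were significant only before correction.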

Participant characteristics
Socio-demographic information for all participants is provided in Table 2. There were significant differences in age between groups. There were also differences in education, though this can be explained by the age differences (younger participants have fewer years of education).

Structural connectivity
There were no significant differences between the groups on averaged connectome intensity and connectome density (Tables 3 and 4; Fig. 1a). Tables 3 and 4 show the group comparisons on whole-brain averaged metrics. There were no significant group differences on any global metrics before correction for multiple comparisons (Tables depict uncorrected values; also see Fig. 2a). Modular interactions showed a significant effect for connections between the visual and ventral attention networks, with stage 1b having lower interaction when compared to healthy controls (p = 0.05; Supplementary Table 2a). However, none of these effects survived FDR corrections. For nodal results (Table 5), several frontal regions (including the right dorsolateral superior frontal gyrus, left middle frontal gyrus, left superior medial orbital frontal gyrus, and right anterior cingulate) and some parietal and temporal regions (including the left angular gyrus) showed differences on nodal metrics (Supplementary Fig. 2). Participant coefficient was the most prominent metric with respect to group differences. However, none of these effects survived FDR corrections.

Functional connectivity
No participants were removed from the analysis due to excessive motion and there were no significant differences between the stages of risk in terms of number of spikes that were censored (F (4,166) = 1.17, p > 0.30).
In the functional connectome, there were no significant differences between groups in averaged connectome density or connectome intensity (Fig. 1b). Figure 2b and Tables 6 and 7 show the group comparisons on whole-brain averaged metrics from the functional imaging data. In the whole-brain averaged metrics, there were no significant group differences on any of the global metrics.
There were some differences in the modular interactions from the functional analysis (all p = 0.05; Supplementary Table 2b). Specifically, group differences were found in the connection between the somatomotor network and the dorsal attention network, as well as the within module connections in the dorsal attention and limbic networks. None of these effects survived FDR corrections.
For nodal results, several frontal (left inferior orbitofrontal gyrus, left cingulum, and bilateral precentral gyri) and temporal regions (bilateral superior temporal poles, right parahippocampal gyrus, right superior gyrus, and right Heschl's gyrus), as well as the right postcentral gyrus of the parietal lobe, showed differences on nodal metrics (Table 8). Participant coefficient was the most prominent metric with respect to group differences. However, none of these effects survived FDR corrections.
Fig. 1 Binarized connectivity matrices from each stage of SMI risk depicted on the Montreal Neurological Institute (MNI152) brain template. Red dots represent nodes from the AAL template, and blue edges represent white matter connections between nodes for the structural analysis, and correlations that survived thresholding in the functional analysis. All figures were made using BrainNet Viewer (http://www.nitrc.org/projects/bnv/) [52]. a Structural connectivity. b Functional connectivity

Machine learning analysis
Our LOO cross-validation methodology trained and tested 171 models, one for each fold. The Linear SVM classifier had an overall accuracy and f1-score of 33.91% and 33.76%, respectively. A confusion matrix comparing true labels to predicted labels is depicted in Fig. 3. On average, 251 of 1359 features were selected for each fold. These features had weights greater than or equal to 0.1% of the L1 norm of feature weights.
Our modified normal-based feature selection methodology was used to select features with high relative importance. The average feature importance was calculated over all classes and all cross-validation folds to determine the most important features. The top 10 most important features are depicted in Fig. 4. Similar to structural and functional findings, global metrics were not identified as significant features, while participant coefficient nodal metrics comprised 7 out of the top 10 most important features. However, only some features overlap with regions identified in structural (left angular gyrus participant coefficient) and functional findings (dorsal attention network).

Structural connectivity
Here, in the first study of structural connectome in transdiagnostic risk, we found similar structural brain connectome profiles across different stages of transdiagnostic risk for SMI. However, uncorrected results may be interpreted cautiously to suggest the presence of some modular interaction differences between groups. The results showed an uncorrected significant effect for connections between the visual and ventral attention networks, with stage 1b having lower interaction than healthy controls. Similarly, nodal results may suggest several frontal regions (including the right dorsolateral superior frontal gyrus, left middle frontal gyrus, left superior medial orbital frontal gyrus, and right anterior cingulate) show between-group differences in the sample. While conclusions cannot be drawn owing to lack of significance after correction, the findings suggest specific networks and brain regions which may be the focus of future investigations.
Table 5 Structural nodes with significant group differences (p < 0.05, uncorrected) on graph theoretical measures (also see Fig. 3a)
Previous studies of MDD generally point to decreased structural connectivity, especially within frontal-subcortical networks and the default mode network [9]. One study reported reduced global efficiency and increased path length in patients with remitted geriatric depression. Depressed patients also had nodal differences from controls in several frontal regions [53]. A large study of MDD outpatients [54] found lower structural connectivity in the default mode network as well as the frontal-thalamo-caudate regions compared to controls. A third study using support vector machine based classification found that small-worldness was the most useful graph metric for classification between MDD and healthy controls [44]. They also reported degree centrality differences in the right inferior parietal cortex, and right pars orbitalis and left rostral anterior cingulate in MDD.
Other studies have reported structural abnormalities in white matter regions that link prefrontal cognitive control areas with subcortical emotion processing regions [55]. On the other hand, a small number of studies of MDD have failed to find connectivity alterations [11,56] or abnormalities in global connectivity metrics [54,57] in MDD. Our findings suggest subtle alterations, but none survived multiple comparison correction. It is possible that transdiagnostic risk is characterized by subtle differences in frontal networks which become more prominent following pathological changes after illness onset.

Functional connectivity
Our functional connectome results matched the structural connectome results, in that all groups showed similar profiles. Furthermore, like the structural results, the functional connectivity analysis showed group differences in modular interactions between the somatomotor network and the dorsal attention network, as well as within the limbic network and the dorsal attention network before multiple comparison correction. Pairwise comparisons between groups at differing stages of risk did not show significant differences, however, limiting our ability to draw conclusions about the cause of group differences. In the nodal results, prior to correction for multiple comparisons, group differences were found in several frontal and temporal regions, as well as the parietal region.
These findings are in line with previous graph theoretic investigations of MDD and psychosis. A recent graph theory study using resting state fMRI data found that functional connectivity in healthy controls and unmedicated participants in their first depressive episode exhibited a similar small world regime, but differed on nodal properties in several regions including the right hippocampus [58]. Another recent study found altered nodal properties of several brain regions in MDD, including bilateral anterior cingulate, right hippocampus, and bilateral middle temporal gyri [59], which overlap with, or are adjacent to, some of the regions identified in our results. Furthermore, a study examining graph theory metrics in first episode psychosis found no differences between healthy controls and those in their first episode of psychosis at baseline or at a 12 month follow-up visit [60], suggesting that the differences in graph theory properties may be difficult to detect in early phases of illness. Changes to structural metrics have been found to precede changes to functional metrics in MDD [61], so a similar process may spare functional connectome measures until later in psychosis as well.

Machine learning
The selected linear SVM classification model was unable to discriminate between the participant categories significantly beyond a random-draw baseline. The high regularization selected by the hyperparameter search suggested that the features were relatively noisy in predicting the category of each data point. However, the selection of only 18.5% of features on average as significant and the high importance rank of features derived from the participant coefficient nodal metrics suggest that these features may be of interest in future studies. Further work with a larger dataset may validate these findings.
Fig. 3 Confusion matrix of true labels vs. labels predicted by the linear SVM classifier using leave-one-out cross-validation

General discussion
The structural and functional connectome analyses in this study both found similar brain connectome profiles across different stages of transdiagnostic risk. The study did not find significant group differences on any of the global and nodal metrics, or modular interactions after corrections, suggesting that changes to brain structure and function may not be prominent during the at-risk phase. Uncorrected results may be interpreted cautiously to guide future research, as they suggest that subtle changes occur in frontal and attention networks in those at transdiagnostic SMI risk. Functional connectivity results additionally implicate temporal regions and suggest a possible role for the limbic network in transdiagnostic risk. While the results do not survive corrections and require validation from future studies, the differences between structural and functional findings also support the view that, while structural and functional networks may share similar topological mechanisms [9,62,63], functional connectivity changes may not be entirely constrained by differences in underlying structural connectivity [63,64], making combined connectome approaches a valuable tool in identifying neurophysiological changes in individuals who go on to develop SMI.

Limitations
The primary limitation of this study is the lack of power owing to the small number of individuals who transitioned to a SMI. Declining transition rates are a common problem in studies of at-risk individuals. Though numbers are lacking from transdiagnostic studies, evidence from CHR suggests that current transition rates are about 15% [65]. Despite this limitation, transdiagnostic studies remain critical to identify risk biomarkers in vulnerable populations. Future studies may benefit from multi-site collaborations or sample enrichment from studies using similar acquisition protocols (see [66]). A second limitation of the study is the possibility of scanner differences between sites. While we used standardized acquisition protocols and statistically controlled for site using multilevel modeling, we acknowledge that scanner-induced discrepancies may still confound results.
In this study, we chose to binarize the structural and functional connectome networks, instead of using weighted networks, even though weighted networks preserve biological information better than binarization [67]. Binarized networks are relatively unaffected by connectivity strength, which allows us to directly compare information derived from structural and functional metrics. Furthermore, binarization allows us to mitigate the effect of site differences on the data, and also makes our findings directly comparable to previous studies which have used binarized networks. Future studies incorporating weighted networks into graph-theoretical analysis may help further elucidate relationships between brain networks and risks for mental illness.
Finally, follow-up for this study was limited to one year. It is likely that some of these individuals will transition to a SMI as time goes on. Future studies with longer follow-up may be better able to elucidate connectome changes associated with longer term transitions.
With respect to SVM, the relatively small sample size, unbalanced dataset, and high dimensional feature space adversely affected the linear SVM model performance. Increasing the sample size, particularly for the transition category, may yield improved results using similar machine learning methodologies.